The Curious Case of SSAS Cube Size
What determines the size of an SSAS cube? I was pretty sure about the answer until I came across a unique scenario while working with one of our clients. The scenario goes like this: we were implementing a cube with 5 facts and 6 dimensions, and we decided to implement it using a star schema. I was noting down the size of the cube at every stage. When the project was in its final stage, the requirement got modified and we were asked to combine all the fact views into a single fact view. Each fact contained almost the same attributes, with a couple of variations, and each had around 15 million records; each dimension, on the other hand, contained somewhere between 4,000 and 8,000 records. Before we modified our approach, the size of the cube was around 900 MB, which was pretty acceptable since we were using a full-fledged MOLAP storage mode. So we were quite happy with its size.
Now we modified our approach by appending all the fact views into a single fact view, keeping everything else the same and just adding a new dimension to keep track of each record's source. At first I was expecting the size of the cube to go down because of the reduction in the number of indexes: earlier there were around 5 x 6 = 30 fact-dimension index combinations to maintain, so there is a definite reduction in index files. (I will soon be publishing a new blog on the SSAS storage file structure, with details about the various types of files SSAS creates for storage.) Coming back to the story: I was expecting a decrease in the cube size, but interestingly it went the other way around. The size shot up from 900 MB to 6.2 GB, and I was puzzled.
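The index arithmetic above can be sketched as a quick back-of-the-envelope check. (The counts simply mirror the 5 x 6 reasoning from the text; the actual SSAS index file layout is more involved than one file per fact-dimension pair.)

```python
# Before the merge: 5 fact views, each indexed against 6 dimensions.
facts_before, dims_before = 5, 6
index_combinations_before = facts_before * dims_before  # 5 * 6 = 30

# After the merge: 1 combined fact view, the original 6 dimensions
# plus 1 new "record source" dimension.
facts_after, dims_after = 1, 6 + 1
index_combinations_after = facts_after * dims_after  # 1 * 7 = 7

print(index_combinations_before, index_combinations_after)  # 30 7
```

So on index count alone, the merged design looks like it should be cheaper, which is why the size increase was so surprising.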
At first we thought it might be due to an increase in the data type size of each column, since when we combined all the fact views, each column was set to the maximum size for that column across all the facts. (This is actually an example of inconsistency in a data warehouse, but I will cover that in some other blog.) So most of the numeric columns got upgraded from int and tinyint to bigint and int. Taking the size variation into the picture, the increase from the 4 bytes of an int to the 8 bytes of a bigint, multiplied by around 80 million records, comes to roughly 80,000,000 * 4 = 320,000,000 bytes, which is about 320 MB. But this was still not the thing that was affecting the size, because even after correcting it, the size did not come down that much.
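That estimate works out as follows. (The 80 million row count is the approximate combined size of the five 15-million-row facts, and the 4/8-byte widths are SQL Server's int and bigint storage sizes.)

```python
# Back-of-the-envelope estimate of the size increase caused by widening
# one numeric column from int to bigint across the combined fact view.
rows = 80_000_000
int_bytes, bigint_bytes = 4, 8               # SQL Server storage sizes
extra_per_row = bigint_bytes - int_bytes     # 4 extra bytes per row

extra_bytes = rows * extra_per_row           # per widened column
extra_mb = extra_bytes / 10**6
print(extra_mb)  # 320.0
```

Even with several widened columns, this accounts for at most a gigabyte or so, nowhere near the observed jump from 900 MB to 6.2 GB.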
Then I started my investigation, and frankly there is no material available on the internet that answers this. So I searched and searched and searched. After some time a few things came into my perspective, which are still not the perfect explanation but might be close to it. Below are the few things that matter for the size of a cube.
1. Amount of space required for mapping
If you have ever studied databases, you must be aware of the paging concept: a database stores everything at the page level. As I said, we were using the MOLAP model, which means all the data was stored in SSAS, not in the database, but it seems SSAS also took some cues from databases. So the mapping space increases as the number of records in a single fact increases.
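To make the paging idea concrete, here is a minimal illustrative sketch of page-level bookkeeping. The 8 KB page size is SQL Server's; the row width and per-page map-entry size are invented for illustration, since SSAS's internal map files are not publicly documented.

```python
import math

PAGE_BYTES = 8 * 1024   # SQL Server page size, used here for illustration
ROW_BYTES = 64          # assumed average row width (illustration only)
MAP_ENTRY_BYTES = 16    # assumed bookkeeping bytes per page (illustration only)

def mapping_overhead_bytes(rows: int) -> int:
    """Estimate map/bookkeeping bytes for a single fact of `rows` records."""
    rows_per_page = PAGE_BYTES // ROW_BYTES   # 128 rows fit on each page
    pages = math.ceil(rows / rows_per_page)
    return pages * MAP_ENTRY_BYTES

# Mapping cost grows with the record count of a single fact:
for rows in (15_000_000, 75_000_000):
    print(rows, mapping_overhead_bytes(rows))
```

Note that in this toy model the overhead is linear in the row count, so five 15M-row facts and one 75M-row fact cost about the same; the blow-up I observed suggests that some of SSAS's per-structure overhead grows faster than linearly once everything lives in one large fact.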
2. The way SSAS indexes data is also responsible: when the number of records in the same file increases, the size of the indexes required to keep track of the columns also grows more rapidly.
Aside from these two solid points there are a few other candidates, but those are just my assumptions, so I would rather not write them here.
There may also be other reasons responsible for the increase in size, so if you are aware of any, please do share.